Map: Race, Income, and Appraisal Value in Illinois

An appraisal is the process by which a lender determines the value of a home. Bias often enters this process when appraisers undervalue homes due to the race of the current homeowner or their neighbors, preventing homebuyers from receiving loans large enough to cover the contract price of a house.

In this analysis, appraisal equity in Illinois is evaluated through the use of two key variables from the Federal Housing Finance Agency (FHFA) Uniform Appraisal Dataset (UAD):

  1. the percentage of appraisals below contract price, and
  2. the count of appraisals.

Demographic data from the US Census Bureau are also included for comparison.

Please note that FHFA data were suppressed for about 23% of census tracts due to privacy concerns, and thus these tracts are not included in this analysis.


Loading packages

The following packages must be loaded in order for this script to run:

library(tidyverse)
library(tidycensus)
library(sf)
library(tmap)
library(leaflet)
library(plotly)
library(GGally)

Extracting data

Federal Housing Finance Agency (FHFA) Uniform Appraisal Dataset (UAD)

Data from the UAD are downloaded from the FHFA datasets page. Tract-level data are used to show differences in appraisal values between local areas.

After data are downloaded, the following operations are done to reduce the size and increase the relevance of the dataset:

  1. Only tracts in Illinois, the study area, are included.
  2. Only data on the percentage of appraisals below contract price and the count of appraisals are included.
  3. Only data from 2018 onward are included (to allow for averaging later).
  4. Only data on home purchases (rather than refinancing) are included.
  5. Unnecessary variables are dropped.
  6. A 5-year average, from 2018 to 2022 (inclusive), is taken to show the most recent appraisal data, while also ensuring that a long enough period is used to reduce noise.

Note that the following code chunk is not run in this R Markdown document (hence the # before each line): the output file only needs to be created once, so skipping it saves time.

## Load data, filter data, and drop unnecessary variables -- COMMENTED, AS THIS ONLY NEEDS TO BE RUN ONCE

# fhfa_uad_trct = read.csv("FHFA_UAD_tract/FHFA_UAD_tract.csv") %>%
#   filter(STATEPOSTAL == "IL" & YEAR >= 2018 & SERIES %in% c("% of Appraisals Below Contract Price", "Count of Appraisals")) %>%
#   select(SERIES, GEOID = TRACT, YEAR, VALUE, PURPOSE)

## Separate out the first variable and take the 5-year average
# fhfa_uad_trct_pctbw = filter(fhfa_uad_trct, SERIES == "% of Appraisals Below Contract Price") %>%
#   select(-SERIES, -PURPOSE) %>%
#   group_by(GEOID) %>%
#   summarise(pctbw = mean(VALUE, na.rm = T)) %>%
#   mutate_at(vars(pctbw), ~ifelse(is.nan(.), NA, .))

## Separate out the second variable and take the 5-year average
# fhfa_uad_trct_ctapr = filter(fhfa_uad_trct, SERIES == "Count of Appraisals") %>%
#   filter(PURPOSE == "Purchase") %>%
#   select(-SERIES, -PURPOSE) %>%
#   group_by(GEOID) %>%
#   summarise(ctapr = mean(VALUE, na.rm = T)) %>%
#   mutate_at(vars(ctapr), ~ifelse(is.nan(.), NA, .))

## Merge the two averaged tables back together
# fhfa_uad_trct1 = merge(fhfa_uad_trct_pctbw, fhfa_uad_trct_ctapr, by = "GEOID")

## Export as a .csv
# write.csv(fhfa_uad_trct1, "FHFA_UAD_tract/FHFA_UAD_tract_filtered.csv")

The following code chunk opens the file created above, removes the unnecessary index variable added by write.csv, and converts the GEOID variable (the census tract code) to character format to allow for merging with census data later on.

# Load cleaned CSV created above
fhfa_uad_trct = read.csv("FHFA_UAD_tract/FHFA_UAD_tract_filtered.csv") %>%
  select(GEOID, pctbw, ctapr)

# Change GEOID to character data type in order to allow for merging later on
fhfa_uad_trct$GEOID = as.character(fhfa_uad_trct$GEOID)
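As a toy illustration of why this conversion matters (hypothetical GEOIDs, not the real pipeline objects): dplyr refuses to join key columns of different types, and tidycensus returns GEOID as character while read.csv parses the FHFA column as numeric.

```r
library(dplyr)

# Hypothetical tract keys: census data with character GEOIDs,
# appraisal data with the same keys read in as numeric
census = tibble(GEOID = c("17001000100", "17001000201"), popul = c(4000, 2500))
fhfa   = tibble(GEOID = c(17001000100, 17001000201), pctbw = c(10.2, 7.5))

# full_join(census, fhfa, by = "GEOID") would fail here:
# dplyr cannot join a character key to a double key

fhfa$GEOID = as.character(fhfa$GEOID)
merged = full_join(census, fhfa, by = "GEOID") # succeeds once the types match
```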

2020 census data

2020 census data on population and race are downloaded directly from the Census Bureau using the tidycensus R package. Variables from the 2020 Decennial Census are total population (P1_001N), total non-Hispanic/Latino white population (P2_005N), total Hispanic/Latino population (P2_002N), and total non-Hispanic/Latino Black population (P2_006N). One final variable, median income (B19326_001), is taken from the 5-year American Community Survey covering 2017 to 2021 (inclusive).

# Load decennial variables
il_trct_dec = get_decennial(
  geography = "tract",
  variables = c(popul = "P1_001N", white = "P2_005N", hispa = "P2_002N", black = "P2_006N"),
  state = "IL",
  year = 2020,
  geometry = T,
  output = "wide"
) %>%
  filter(!st_is_empty(geometry)) # REMOVE EMPTY TRACTS!
## Getting data from the 2020 decennial Census
## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
## Using the PL 94-171 Redistricting Data Summary File
## Note: 2020 decennial Census data use differential privacy, a technique that
## introduces errors into data to preserve respondent confidentiality.
## i Small counts should be interpreted with caution.
## i See https://www.census.gov/library/fact-sheets/2021/protecting-the-confidentiality-of-the-2020-census-redistricting-data.html for additional guidance.
## This message is displayed once per session.
# Load ACS variable
il_trct_acs = get_acs(
  geography = "tract",
  variables = c(mdinc = "B19326_001"),
  state = "IL",
  year = 2021,
  survey = "acs5",
  geometry = T,
  output = "wide"
) %>%
  filter(!st_is_empty(geometry)) %>% # REMOVE EMPTY TRACTS!
  st_drop_geometry() %>% # REMOVE GEOMETRY (ALREADY PROVIDED FROM DECENNIAL VARIABLES)
  select(GEOID, mdinc = mdincE) # REMOVE OTHER UNNECESSARY VARIABLES
## Getting data from the 2017-2021 5-year ACS
## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.

Transforming data

First, the census tables and appraisal data are merged.

# Merge tables
il_trct = full_join(il_trct_dec, il_trct_acs, by = "GEOID") %>%
  full_join(fhfa_uad_trct, by = "GEOID")

Next, each race variable is converted to a share of the total tract population; these shares, along with pctbw, are then multiplied by 100 so that the values can be visualized as percentages later on.

# Calculate percentages, then multiply these values and pctbw by 100
il_trct = il_trct %>% mutate_at(vars(c(white, hispa, black)), ~ (./popul)) %>%
  mutate_at(vars(c(white, hispa, black, pctbw)), ~ (.*100))

To clean up, non-integer values are rounded and variables are relocated in a more sensible order.

# Round non-integer values
il_trct$white = round(il_trct$white, digits = 1)
il_trct$hispa = round(il_trct$hispa, digits = 1)
il_trct$black = round(il_trct$black, digits = 1)
il_trct$pctbw = round(il_trct$pctbw, digits = 1)
il_trct$ctapr = round(il_trct$ctapr, digits = 1)

# Rearrange variables
il_trct = il_trct %>%
  relocate(mdinc, .after = black) %>%
  relocate(pctbw, .after = mdinc) %>%
  relocate(ctapr, .after = pctbw)

Finally, two alternate versions of this table are created: one with NA values removed, and one with NA and zero values removed and all variables log-transformed. The first allows for cleaner map visualizations, while the second allows for linear regressions with a logarithmic transformation.

# No NA values
il_trct_noNA = na.omit(il_trct)

# No NA or 0 values - logarithmic transformation
il_trct_log = filter(il_trct_noNA, popul > 0 & white > 0 & hispa > 0 & black > 0 & pctbw > 0) %>%
  mutate_at(vars(white, hispa, black, mdinc, pctbw, ctapr), ~ log(.))
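As a quick self-contained aside (made-up toy numbers, not the Illinois data), the log transform compresses right-skewed variables such as income, which is what makes the linear regressions on the transformed table more reasonable:

```r
# Toy right-skewed incomes: the heavy tail pulls the mean far above the median
incomes = c(20000, 35000, 50000, 80000, 250000)
mean(incomes)    # 87000
median(incomes)  # 50000

# On the log scale the multiplicative spread becomes additive,
# and the mean and median nearly agree
log_incomes = log(incomes)
mean(log_incomes)
median(log_incomes)
```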

Mapping data

In order to create interactive maps, the tmap package is set to interactive viewing mode, and label variables with percentage and dollar symbols are created for the popups.

# Set tmap to interactive viewing
tmap_mode("view")
## tmap mode set to interactive viewing
# Create "%" and "$" string variables for labeling
il_trct_noNA = il_trct_noNA %>%
  mutate(white_pct = paste0(sprintf("%.1f", white), "%")) %>%
  mutate(hispa_pct = paste0(sprintf("%.1f", hispa), "%")) %>%
  mutate(black_pct = paste0(sprintf("%.1f", black), "%")) %>%
  mutate(mdinc_usd = paste0("$", as.character(format(mdinc, big.mark = ",", trim = T)))) %>%
  mutate(pctbw_pct = paste0(sprintf("%.1f", pctbw), "%"))

Next, the map layers are defined as a single tmap object, using the “no NA” version of the data table. This object is then converted to a leaflet map so that the layer-control settings can be changed.

# Create popup label object
popup = c("2020 Population" = "popul", "2020 Percent Non-Hispanic/Latino White" = "white_pct", "2020 Percent Non-Hispanic/Latino Black" = "black_pct", "2020 Percent Hispanic/Latino" = "hispa_pct", "2017-2021 Median Income" = "mdinc_usd", "2018-2022 Percent of Appraisals Below Contract Price" = "pctbw_pct", "2018-2022 Annual Count of Appraisals" = "ctapr")

# Create map object
map = tm_shape(il_trct_noNA, name = "2020 Population by<br>Census Tract") +
  tm_polygons(title = "2020 Population by<br>Census Tract", col = "popul", style = "jenks", palette = "Greys", alpha = 0.5, border.alpha = 0.1, id = "NAME", popup.format = c(html.escape = F), popup.vars = popup) +
  tm_shape(il_trct_noNA, name = "2020 Percent Non-<br>Hispanic/Latino White") +
  tm_polygons(title = "2020 Percent Non-<br>Hispanic/Latino White", col = "white", style = "jenks", palette = "Reds", alpha = 0.5, border.alpha = 0.1, id = "NAME", popup.format = c(html.escape = F), popup.vars = popup, legend.format=list(fun=function(x) paste0(x, "%"))) +
  tm_shape(il_trct_noNA, name = "2020 Percent Non-<br>Hispanic/Latino Black") +
  tm_polygons(title = "2020 Percent Non-<br>Hispanic/Latino Black", col = "black", style = "jenks", palette = "Oranges", alpha = 0.5, border.alpha = 0.1, id = "NAME", popup.format = c(html.escape = F), popup.vars = popup, legend.format=list(fun=function(x) paste0(x, "%"))) +
  tm_shape(il_trct_noNA, name = "2020 Percent<br>Hispanic/Latino") +
  tm_polygons(title = "2020 Percent<br>Hispanic/Latino", col = "hispa", style = "jenks", palette = "YlOrBr", alpha = 0.5, border.alpha = 0.1, id = "NAME", popup.format = c(html.escape = F), popup.vars = popup, legend.format=list(fun=function(x) paste0(x, "%"))) +
  tm_shape(il_trct_noNA, name = "2017-2021 Median<br>Income") +
  tm_polygons(title = "2017-2021 Median<br>Income", col = "mdinc", style = "jenks", palette = "Greens", alpha = 0.5, border.alpha = 0.1, id = "NAME", popup.format = c(html.escape = F), popup.vars = popup, legend.format=list(fun=function(x) paste0("$", x))) +
  tm_shape(il_trct_noNA, name = "2018-2022 Percent of<br>Appraisals Below<br>Contract Price") +
  tm_polygons(title = "2018-2022 Percent of<br>Appraisals Below<br>Contract Price", col = "pctbw", style = "jenks", palette = "Blues", alpha = 0.5, border.alpha = 0.1, id = "NAME", popup.format = c(html.escape = F), popup.vars = popup, legend.format=list(fun=function(x) paste0(x, "%"))) +
  tm_shape(il_trct_noNA, name = "2018-2022 Annual<br>Count of Appraisals") +
  tm_polygons(title = "2018-2022 Annual<br>Count of Appraisals", col = "ctapr", style = "jenks", palette = "Purples", alpha = 0.5, border.alpha = 0.1, id = "NAME", popup.format = c(html.escape = F), popup.vars = popup)

# Convert to leaflet
map_leaflet = tmap_leaflet(map)

# Make layers overlay and deselect all layers but one by default
map_leaflet1 = map_leaflet %>% addLayersControl(overlayGroups = c("2020 Population by<br>Census Tract", "2020 Percent Non-<br>Hispanic/Latino White", "2020 Percent Non-<br>Hispanic/Latino Black", "2020 Percent<br>Hispanic/Latino", "2017-2021 Median<br>Income", "2018-2022 Percent of<br>Appraisals Below<br>Contract Price", "2018-2022 Annual<br>Count of Appraisals")) %>%
  hideGroup(c("2020 Population by<br>Census Tract", "2020 Percent Non-<br>Hispanic/Latino White", "2020 Percent Non-<br>Hispanic/Latino Black", "2020 Percent<br>Hispanic/Latino", "2017-2021 Median<br>Income", "2018-2022 Annual<br>Count of Appraisals"))


Use the map below to explore how demographic factors such as race and income are associated with appraisal values.

Hover over the layers menu on the right side of the map and use the check boxes within to select one or more data layers to view. You can also click on an individual tract to get more information about it.


Regressions


Two correlograms are created with the GGally package: one with the linear data (il_trct_noNA) and one with the log-transformed data (il_trct_log).

il_trct_noNA_correlogram = il_trct_noNA %>% select(white, hispa, black, mdinc, pctbw, ctapr) %>% st_drop_geometry() %>% ggpairs(title="Correlogram of all variables", lower = list(continuous = wrap("smooth_loess", alpha = 0.2, size = 0.05, color = "blue")))

il_trct_log_correlogram = il_trct_log %>% select(white, hispa, black, mdinc, pctbw, ctapr) %>% st_drop_geometry() %>% ggpairs(title="Correlogram of all variables with logarithmic transformation", lower = list(continuous = wrap("smooth_loess", alpha = 0.2, size = 0.05, color = "blue")))

The two correlograms are visualized below:


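Beyond the pairwise correlograms, an explicit regression could be fit on the log-transformed table. The following is a minimal sketch, assuming the il_trct_log object built above; the choice of pctbw as the response is illustrative, not part of the original analysis.

```r
# Sketch of a log-log regression (assumes the il_trct_log object built above);
# st_drop_geometry removes the sf geometry column before modeling
fit = lm(pctbw ~ white + hispa + black + mdinc,
         data = st_drop_geometry(il_trct_log))

# With every variable log-transformed, each coefficient reads as an elasticity:
# the approximate % change in pctbw for a 1% change in the predictor
summary(fit)
```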
